<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>



<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="generator" content="Docutils 0.4: http://docutils.sourceforge.net/"><title>Distributed Resource Management Application API</title>

<style type="text/css">

/*
:Authors: Ian Bicking, Michael Foord
:Contact: fuzzyman@voidspace.org.uk
:Date: 2005/08/26 
:Version: 0.1.0
:Copyright: This stylesheet has been placed in the public domain.

Stylesheet for Docutils.
Based on ``blue_box.css`` by Ian Bicking
and ``html4css1.css`` revision 1.46.
*/

@import url(http://rst2a.com/static/styles/html/default.css);

body {
  font-family: Arial, sans-serif;
}

em, i {
  /* Typically serif fonts have much nicer italics */
  font-family: Times New Roman, Times, serif;
}

a.target {
  color: blue;
}

a.target {
  color: blue;
}

a.toc-backref {
  text-decoration: none;
  color: black;
}

a.toc-backref:hover {
  background-color: inherit;
}

a:hover {
  background-color: #cccccc;
}

div.attention, div.caution, div.danger, div.error, div.hint,
div.important, div.note, div.tip, div.warning {
  background-color: #cccccc;
  padding: 3px;
  width: 80%;
}

div.admonition p.admonition-title, div.hint p.admonition-title,
div.important p.admonition-title, div.note p.admonition-title,
div.tip p.admonition-title  {
  text-align: center;
  background-color: #999999;
  display: block;
  margin: 0;
}

div.attention p.admonition-title, div.caution p.admonition-title,
div.danger p.admonition-title, div.error p.admonition-title,
div.warning p.admonition-title {
  color: #cc0000;
  font-family: sans-serif;
  text-align: center;
  background-color: #999999;
  display: block;
  margin: 0;
}

h1, h2, h3, h4, h5, h6 {
  font-family: Helvetica, Arial, sans-serif;
  border: thin solid black;
  /* This makes the borders rounded on Mozilla, which pleases me */
  -moz-border-radius: 8px;
  padding: 4px;
}

h1 {
  background-color: #444499;
  color: #ffffff;
  border: medium solid black;
}

h1 a.toc-backref, h2 a.toc-backref { 
  color: #ffffff;
}

h2 {
  background-color: #666666;
  color: #ffffff;
  border: medium solid black;
}

h3, h4, h5, h6 {
  background-color: #cccccc;
  color: #000000;
}

h3 a.toc-backref, h4 a.toc-backref, h5 a.toc-backref, 
h6 a.toc-backref { 
  color: #000000;
}

h1.title {
  text-align: center;
  background-color: #444499;
  color: #eeeeee;
  border: thick solid black;
  -moz-border-radius: 20px;
}

table.footnote {
  padding-left: 0.5ex;
}

table.citation {
  padding-left: 0.5ex
}

pre.literal-block, pre.doctest-block {
  border: thin black solid;
  padding: 5px;
}

.image img { border-style : solid;
            border-width : 2px;
}

h1 tt, h2 tt, h3 tt, h4 tt, h5 tt, h6 tt {
  font-size: 100%;
}

code, tt {
  color: #000066;
}

</style></head><body>
<div class="document" id="distributed-resource-management-application-api">
<h1 class="title">Distributed Resource Management Application API</h1>
<p>This guide is a tutorial for getting started programming with
DRMAA. It is basically a one to one translation of the
<a class="reference" href="http://gridengine.sunsource.net/project/gridengine/howto/drmaa.html">original in C</a>.
It assumes that you already know what DRMAA is and know how DRMAA is
supported in the Grid Engine 6.0 release. If you do not already know
these things, try these web sites:</p>
<blockquote>
<ul class="simple">
<li><a class="reference" href="http://www.drmaa.org/">The DRMAA Website</a></li>
<li><a class="reference" href="http://gridengine.sunsource.net/source/browse/gridengine/doc/README-DRMAA.txt">The Grid Engine 6.0 DRMAA README</a></li>
<li><a class="reference" href="http://gridengine.sunsource.net/unbranded-source/browse/%7Echeckout%7E/gridengine/source/libs/japi/drmaa.html?content-type=text/html">The Grid Engine libdrmaa docs</a></li>
<li><a class="reference" href="http://gridengine.sunsource.net/unbranded-source/browse/%7Echeckout%7E/gridengine/source/libs/japi/japi.html?content-type=text/html">The Grid Engine libjapi docs</a></li>
<li><a class="reference" href="http://gridengine.sunsource.net/project/gridengine/howto/filestaging/filestaging6.html">DRMAA synchronized file staging</a></li>
</ul>
</blockquote>
<p>This tutorial assumes that you have <a class="reference" href="http://gridengine.sunsource.net/files/documents/7/36/DRMAA-python-0.2.tar.gz">DRMAA-python</a> installed.
The original file was developed by Enrico Sirola, with modifications
in 2007 by Cheng Soon Ong</p>
<div class="section">
<h1><a id="starting-and-stopping-a-session" name="starting-and-stopping-a-session">Starting and Stopping a Session</a></h1>
<p>The following code segment shows the most basic DRMAA python binding
program</p>
<p>example1:</p>
<pre class="literal-block"> 1  #!/usr/bin/env python
 2
 3  import DRMAA
 4
 5  def main():
 6      """Create a DRMAA session and exit"""
 7      s=DRMAA.Session()
 8      print 'A DRMAA object was created'
 9      s.init()
10      print 'A session was started successfully'
11      s.exit()
12
13  if __name__=='__main__':
14      main()
</pre>
<p>The first thing to notice is that every call to a DRMAA function will
return an error code. In this tutorial, we ignore all error codes.</p>
<p>Now let's look at the functions being called. First, on line 7, we
initialise a Session object by calling DRMAA.Session(). Then we
initialise the session in line 9 by init()
This function sets up the DRMAA session and must be
called before most other DRMAA functions. Some functions, like
getContact(), can be called before init(), but these
functions only provide general information. Any function that does
work, such as runJob() or wait() must be called after
init() returns. If such a function is called before init()
returns, it will return an error.</p>
<p>init() creates a session and starts an event client listener
thread. The session is used for organizing jobs submitted through
DRMAA, and the thread is used to receive updates from the queue master
about the state of jobs and the system in general. Once init()
has been called successfully, it is the responsibility of the calling
application to also call exit() before terminating. If an
application does not call exit() before terminating, session
state may be left behind in the user's home directory (under
.sge/drmaa), and the queue master may be left with a dead event client
handle, which can decrease queue master performance.</p>
<p>At the end of our program, on line 11, we call
exit(). exit() cleans up the session and stops the event
client listener thread. Most other DRMAA functions must be called
before exit(). Some functions, like getContact(), can be
called after exit(), but these functions only provide general
information. Any function that does work, such as runJob() or
wait() must be called before exit() is called. If such a
function is called after exit() is called, it will return an error.</p>
<p>example1.1:</p>
<pre class="literal-block"> 1  #!/usr/bin/env python
 2
 3  import DRMAA
 4
 5  def main():
 6      """Create a session, show that each session has an id,
 7      use session id to disconnect, then reconnect. Then exit"""
 8      s=DRMAA.Session()
 9      s.init()
10      print 'A session was started successfullly'
11      response = s.getContact()
12      print 'getContact() returns: ' + response
13      s.exit()
14      print 'Exited from session'
15
16      s.init(response)
17      print 'Session was restarted successfullly'
18      s.exit()
19
20
21  if __name__=='__main__':
22      main()
</pre>
<p>This example is very similar to Example 1. The difference is that it
uses the Grid Engine feature of reconnectable sessions. The DRMAA
concept of a session is translated into a session tag in the Grid
Engine job structure. That means that every job knows to which
session it belongs. With reconnectable sessions, it's possible to
initialize the DRMAA library to a previous session, allowing the
library access to that session's job list. The only limitation,
though, is that jobs which end between the calls to exit() and init()
will be lost, as the reconnecting session will no longer see these
jobs, and so won't know about them.</p>
<p>Through line 10, this example is very similar to Example 1. On line
11, however, we use the getContact() function to get the
contact information for this session. On line 13 we then exit the
session. On line 16, we use the stored contact information to
reconnect to the previous session. Had we submitted jobs before
calling exit(), those jobs would now be available again for operations
such as wait() and synchronize(). Finally, on line 18 we
exit the session a second time.</p>
</div>
<div class="section">
<h1><a id="running-a-job" name="running-a-job">Running a Job</a></h1>
<p>The following code segment shows how to use the DRMAA python binding
to submit a job to Grid Engine:</p>
<p>example2:</p>
<pre class="literal-block"> 1  #!/usr/bin/env python
 2
 3  import DRMAA
 4  import os
 5
 6  def main():
 7      """Submit a job.
 8      Note, need file called sleeper.sh in home directory. An example:
 9      echo 'Hello World $1'
10      """
11      s=DRMAA.Session()
12      s.init()
13
14      print 'Creating job template'
15      jt = s.createJobTemplate()
16      jt.remoteCommand = os.getcwd() + '/sleeper.sh'
17      jt.args = ['42','Simon says:']
18      jt.joinFiles=True
19      jt.outputPath=":"+DRMAA.JobTemplate.HOME_DIRECTORY+'/tmp/DRMAA_JOB_OUT'
20
21      jobid = s.runJob(jt)
22      print 'Your job has been submitted with id ' + jobid
23
24      print 'Cleaning up'
25      s.deleteJobTemplate(jt)
26      s.exit()
27
28  if __name__=='__main__':
29      main()
</pre>
<p>The beginning and end of this program are the same as the previous
one. What's different is in lines 15-25. On line 15 we ask DRMAA to
allocate a job template for us. A job template is a structure used to
store information about a job to be submitted. The same template can
be reused for multiple calls to runJob() or
runBulkJob().</p>
<p>On line 16 we set the REMOTE_COMMAND attribute. This attribute
tells DRMAA where to find the program we want to run. Its value is the
path to the executable. The path be be either relative or absolute. If
relative, it is relative to the WD attribute, which if not set
defaults to the user's home directory. For more information on DRMAA
attributes, please see the attributes man page. Note that for
this program to work, the script "sleeper.sh" must be in the current
directory.</p>
<p>On line 17 we set the V_ARGV attribute. This attribute tells
DRMAA what arguments to pass to the executable. For more information
on DRMAA attributes, please see the attributes man page.</p>
<p>On line 21 we submit the job with runJob(). DRMAA will place
the id assigned to the job into the character array we passed to
runJob(). The job is now running as though submitted by
qsub. At this point calling exit() and/or terminating the
program will have no effect on the job.</p>
<p>To clean things up, we delete the job template on line 25. This frees
the memory DRMAA set aside for the job template, but has no effect on
submitted jobs. Finally, on line 26, we call exit().</p>
<p>If instead of a single job we had wanted to submit an array job, we
could have replaced the else on line 21 and 22 with the following:</p>
<p>example2.1:</p>
<pre class="literal-block">21      jobid = s.runBulkJobs(jt,1,30,2)
22      print 'Your job has been submitted with id ' + str(jobid)
</pre>
<p>This code segment submits an array job with 15 tasks numbered 1, 3, 5,
7, etc. An important difference to note is that runBulkJobs()
returns the job ids in an array. On line 22, we print all
the job ids.</p>
</div>
<div class="section">
<h1><a id="waiting-for-a-job" name="waiting-for-a-job">Waiting for a Job</a></h1>
<p>Now we're going to extend our example to include waiting for a job to finish.</p>
<p>example3:</p>
<pre class="literal-block"> 1  #!/usr/bin/env python
 2
 3  import DRMAA
 4  import os
 5
 6  def main():
 7      """Submit a job and wait for it to finish.
 8      Note, need file called sleeper.sh in home directory. An example:
 9      echo 'Hello World $1'
10      """
11      s=DRMAA.Session()
12      s.init()
13
14      print 'Creating job template'
15      jt = s.createJobTemplate()
16      jt.remoteCommand = os.getcwd() + '/sleeper.sh'
17      jt.args = ['42','Simon says:']
18      jt.joinFiles=True
19      jt.outputPath=":"+DRMAA.JobTemplate.HOME_DIRECTORY+'/tmp/DRMAA_JOB_OUT'
20
21      jobid = s.runJob(jt)
22      print 'Your job has been submitted with id ' + jobid
23
24      retval = s.wait(jobid, DRMAA.Session.TIMEOUT_WAIT_FOREVER)
25      print 'Job: ' + str(retval.getJobId()) + ' finished with code ' + str(retval.getExitStatus())
26
27      print 'Cleaning up'
28      s.deleteJobTemplate(jt)
29      s.exit()
30
31  if __name__=='__main__':
32      main()
</pre>
<p>This example is very similar to Example 2 except for line 24. On
line 24 we call wait() to wait for the job to end. We have to
give wait() both the id of the job for which we want to wait and
a place to write the id of the job for which we actually waited
because the job id we pass in could be JOB_IDS_SESSION_ANY, in
which case wait() must have a way of tell us which job is the
one that made it return. We also have to pass to wait() how long
we are willing to wait for the job to finish. This could be a number
of seconds, or it could be either TIMEOUT_WAIT_FOREVER or
TIMEOUT_NO_WAIT. Lastly, we
also have to pass in a place to write the exit status and the usage
information. The exit status is an opaque number that is passed to the
w...() functions to get information about how the job
exited. The usage information is a list of name=value pairs in a DRMAA
values structure. The values structure works exactly the same as the
ids structure we talked about in Example 2.1.
Assuming the wait worked, we query the job's exit status on lines
25.</p>
<p>An alternative to wait() when working with multiple jobs, such
as jobs submitted by runBulkJobs() or multiple calls to
runJob() is synchronize(). synchronize() waits for
a set of jobs to finish. To use synchronize(), we could replace
lines 21-24 with the following:</p>
<p>example3.1:</p>
<pre class="literal-block">21      joblist = s.runBulkJobs(jt,1,30,2)
22      print 'Your job has been submitted with id ' + str(joblist)
23
24      s.synchronize(joblist, DRMAA.Session.TIMEOUT_WAIT_FOREVER, True)
</pre>
<p>Line 21 now call runBulkJobs() so that we have several
jobs for which to wait. On line 56, instead of calling wait(),
we call synchronize(). synchronize() takes only three
iteresting parameters. The first is the list of ids for which to
wait. This list must be a NULL-terminated array of strings. If the
special id, JOB_IDS_SESSION_ALL, appears in the array,
synchronize() will wait for all jobs submitted via DRMAA during
this session, i.e. since init() was called. The second is how
long to wait for all the jobs in the list to finish. This is the same
as the timeout parameter for wait(). The third is whether this
call to synchronize() should clean up after the job. After a job
completes, it leaves behind accounting information, such as exist
status and usage, until either wait() or synchronize()
with dispose set to true is called. It is the responsibility of the
application to make sure one of these two functions is called for
every job. Not doing so creates a memory leak. Note that calling
synchronize() with dispose set to true flushes all accounting
information for all jobs in the list. If you want to use
synchronize() and still recover the accounting information, set
dispose to false and call wait() for each job. To do this in
Example 3, we would replace lines 21-24 with the following:</p>
<p>example3.2:</p>
<pre class="literal-block">26      s.synchronize(joblist, DRMAA.Session.TIMEOUT_WAIT_FOREVER,False)
27      for curjob in joblist:
28          print 'Collecting job ' + curjob
29          retval = s.wait(curjob, DRMAA.Session.TIMEOUT_WAIT_FOREVER)
30          print 'Job: ' + str(retval.getJobId()) + ' finished with code ' + str(retval.getExitStatus())
</pre>
<p>What's different is that on line 26, we set dispose to false, and then
on lines 27-30 we wait once for each job, printing the exit status
and usage information as we did in Example 3. We pass
JOB_IDS_SESSION_ANY to wait() as the job id because we
already know that all the jobs have finished, so we don't really care
in what order we process them. In an interactive system where we
couldn't guarantee that more jobs wouldn't be submitted between the
synchronize and the wait, we would have to store the job ids from the
runBulkJobs() in an array and then wait for each job
specifically. Otherwise, the wait() could end up waiting for a
job submitted after the call to synchronize().</p>
</div>
<div class="section">
<h1><a id="controling-a-job" name="controling-a-job">Controling a Job</a></h1>
<p>Now let's look at an example of how to control a job from DRMAA:</p>
<p>example4:</p>
<pre class="literal-block"> 1  #!/usr/bin/env python
 2
 3  import DRMAA
 4  import os
 5
 6  def main():
 7      """Submit a job, then kill it.
 8      Note, need file called sleeper.sh in home directory. An example:
 9      echo 'Hello World $1'
10      """
11      s=DRMAA.Session()
12      s.init()
13
14      print 'Creating job template'
15      jt = s.createJobTemplate()
16      jt.remoteCommand = os.getcwd() + '/sleeper.sh'
17      jt.args = ['42','Simon says:']
18      jt.joinFiles=True
19      jt.outputPath=":"+DRMAA.JobTemplate.HOME_DIRECTORY+'/tmp/JOB_OUT'
20
21      jobid = s.runJob(jt)
22      print 'Your job has been submitted with id ' + jobid
23      # options are: SUSPEND, RESUME, HOLD, RELEASE, TERMINATE
24      s.control(jobid,DRMAA.Session.TERMINATE)
25
26      print 'Cleaning up'
27      s.deleteJobTemplate(jt)
28      s.exit()
29
30  if __name__=='__main__':
31      main()
</pre>
<p>This example is very similar to Example 2 except for line 24. On
line 24 we use control() to delete the job we just
submitted. Aside from deleting the job, we could have also used
control() to suspend, resume, hold, or release it. For more
information, see the control  man page.</p>
<p>Note that control() can be used to control jobs not submitted
through DRMAA. Any valid SGE job id could be passed to control()
as the id of the job to delete.</p>
</div>
<div class="section">
<h1><a id="getting-job-status" name="getting-job-status">Getting Job Status</a></h1>
<p>Here's an example of using DRMAA to query the status of a job:</p>
<p>example5:</p>
<pre class="literal-block"> 1  #!/usr/bin/env python
 2
 3  import DRMAA
 4  import time
 5  import os
 6
 7  def main():
 8      """Submit a job, and check its progress.
 9      Note, need file called sleeper.sh in home directory. An example:
10      echo 'Hello World $1'
11      sleep 30s
12      """
13      s=DRMAA.Session()
14      s.init()
15
16      print 'Creating job template'
17      jt = s.createJobTemplate()
18      jt.remoteCommand = os.getcwd() + '/sleeper.sh'
19      jt.args = ['42','Simon says:']
20      jt.joinFiles=True
21      jt.outputPath=":"+DRMAA.JobTemplate.HOME_DIRECTORY+'/tmp/JOB_OUT'
22
23      jobid = s.runJob(jt)
24      print 'Your job has been submitted with id ' + jobid
25
26      # Who needs a case statement when you have dictionaries?
27      decodestatus = {
28          DRMAA.Session.UNDETERMINED: 'process status cannot be determined',
29          DRMAA.Session.QUEUED_ACTIVE: 'job is queued and active',
30          DRMAA.Session.SYSTEM_ON_HOLD: 'job is queued and in system hold',
31          DRMAA.Session.USER_ON_HOLD: 'job is queued and in user hold',
32          DRMAA.Session.USER_SYSTEM_ON_HOLD: 'job is queued and in user and system hold',
33          DRMAA.Session.RUNNING: 'job is running',
34          DRMAA.Session.SYSTEM_SUSPENDED: 'job is system suspended',
35          DRMAA.Session.USER_SUSPENDED: 'job is user suspended',
36          DRMAA.Session.DONE: 'job finished normally',
37          DRMAA.Session.FAILED: 'job finished, but failed',
38          }
39
40      for ix in range(10):
41          print 'Checking ' + str(ix) + ' of 10 times'
42          status = s.getJobProgramStatus(jobid)
43          print decodestatus.get(status)
44          time.sleep(5)
45
46      print 'Cleaning up'
47      s.deleteJobTemplate(jt)
48      s.exit()
49
50  if __name__=='__main__':
51      main()
</pre>
<p>Again, this example is very similar to Example 2, this time with the
exception of lines 26-44. First, after submitting the job, we sleep
for 20 seconds to give SGE time to schedule the job. Then, on line 42,
we use getJobProgramStatus() to get the status of the job. Line 43
determine what the job status is and report it.</p>
</div>
<div class="section">
<h1><a id="getting-drm-information" name="getting-drm-information">Getting DRM information</a></h1>
<p>Lastly, let's look at how to query the DRMAA library for information
about the DRMS and the DRMAA implementation itself:</p>
<p>example6:</p>
<pre class="literal-block"> 1  #!/usr/bin/env python
 2
 3  import DRMAA
 4
 5  def main():
 6      """Query the system."""
 7      s=DRMAA.Session()
 8      print 'A DRMAA object was created'
 9      print 'Supported contact strings: ' + s.getContact()
10      print 'Supported DRM systems: ' + str(s.getDRMSInfo())
11      print 'Supported DRMAA implementations: ' + str(s.getDRMAAImplementation())
12
13
14      s.init()
15      print 'A session was started successfully'
16      print 'Supported contact strings: ' + s.getContact()
17      print 'Supported DRM systems: ' + str(s.getDRMSInfo())
18      print 'Supported DRMAA implementations: ' + str(s.getDRMAAImplementation())
19      print 'Version ' + str(s.getVersion())
20
21      print 'Exiting'
22      s.exit()
23
24
25  if __name__=='__main__':
26      main()
</pre>
<p>On line 9, we get the contact string list. This is the list of
contact strings that will be understood by this DRMAA
instance. Normally on of these strings is used to select to which DRM
this DRMAA instance should be bound. In the Grid Engine 6.0
implementation, the contact string list is empty because there is only
ever one possible DRM to which to bind. On line 10, we get the list of
supported DRM systems. For the Grid Engine 6.0 implementation, this
will always be Grid Engine 6.0. On line 11, we get the list of
supported DRMAA implementations. For the Grid Engine 6.0
implementation, this will always be Grid Engine 6.0. On line 14, we
call init(). After init() has been called, the
get_contact() and get_DRM_system() calls change. On line
16, we call get_contact() again, this time to get the contact
string that was used to bind to a DRMS in init(). For the Grid
Engine 6.0 implementation, this will always be an empty string. On
line 17, we call getDRMSInfo() again, this time to get the
name of the DRMS to which DRMAA is bound. For the Grid Engine 6.0
implementation, this will always be Grid Engine 6.0. On line 18, we
call get_implementation() again, this time to get the name
of the DRMAA implementation to which the application is bound. For the
Grid Engine 6.0 implementation, this will always be Grid Engine
6.0. On line 19, we get the version number of the DRMAA C binding
specification supported by this DRMAA implementation. For the Grid
Engine 6.0 implementation this is currently version 0.8. Finally, on
line 22, we end the session with exit().</p>
</div>
</div>
</body></html>